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This paper proposes an extension to the classical 3D variational data assimilation approach by explicitly 
incorporating as a prior information, the transform-domain sparsity observed in a large class of geophysical 
signals. In particular, the proposed framework extends the maximum likelihood estimation of the analysis 

C/3 

state to the maximum a posteriori estimator, from a Bayesian perspective. The promise of the methodology 
Qh is demonstrated via application to a ID synthetic example. 

1 Introduction 

in 

Data Assimilation has played a central role in improving the forecasting ability of hydro-met eorologic, cli- 
matic and oceanic modeling systems. The basic idea is to consistently fuse the observations of the prominent 

state variables (e.g., wind, temperature, pressure) or physical states (e.g., cloud moisture, precipitation), 
^ iterativcly in time, into the knowledge of a Numerical Prediction Model (NPM) to reduce the estimation 

uncertainty of the state variables of interest. Data assimilation methods typically use the observations to 

H 

update the current a priori model estimates of the states {background) and produce an a posteriori state 
{analysis) to be used for prediction of the next time step {forecast). 

Data assimilation primarily stems from the least-squares estimation in the statistical sense. Basically, 
the available methods can be divided into two major categories. The first, is the group of recursive (least- 
squares) filtering methods which essentially exploit the temporal evolution of some statistical characteristics 
of the system (e.g., covariance) to efficiently track the optimal states sequentially in time (e.g., Kalman filter 
driven data assimilation methods). The second category, the so-called variational methods, relies on a batch 
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mode direct optimization at each instant of time when the observations become available (e.g., 3D or 4D 
variational approaches). We remark here that, these two apparently distinct approaches often share similar 
mathematical concepts and are quite equivalent in many cases; however, with different implementation 
strategies in practice. In this paper, we restrict our attention to the second group of assimilation methods 
and in particular the more primitive 3D variational (3D-VAR) formulation. For a thorough review of the 
historical evolution of the data assimilation techniques the reader is referred to Talagrand and Courtier 
(1987), Ghil and Malanotte-Rizzoli (1991), Daley (1993), Bouttier and Courtier (2002), Kalnay (2003), 
Zhou et al. (2006), Evensen (2007), and references therein. 

The classical variational data assimilation typically involves solving a smooth optimization problem in 
which the solution has a minimum weighted Euclidean distance to both observation and background estimates 
where the weights are dictated by the pair of model and observation error covariance matrices. From a 
statistical estimation point of view this procedure is equivalent to the Maximum Likelihood (ML) estimation 
of the unknown state in a Gaussian noise (error) environment. In this classical formulation no a priori 
assumption is explicitly taken into account about the underlying structure of the analysis state. 

Natural signals can typically be projected onto transform domains (e.g., Fourier, Discrete Cosine, Wavelet) 
in which a large fraction of the representation coefficients is very close to zero and only a few of them are 
significant, a signature typically referred to as "sparsity". For instance, the wavelet transform of piece-wise 
smooth natural signals with occasional rapid variations often translates to non-Gaussian heavy tail distribu- 
tion of the wavelet coefficients with a concentrated probability mass around zero. 

Here, we propose a new formalism for variational data assimilation which explicitly incorporates the 
underlying sparsity in the analysis state as an a priori knowledge. In a very simple example we demonstrate 
how this a priori knowledge can stabilize and make the computation of the analysis state more accurate 
compared to a classical solution. 

Section 2 is devoted to explaining the notation. In Section 3, we briefly review the preliminary concept 
of the 3D-VAR data assimilation scheme. In this setting, an elementary ID example in the Gaussian domain 
is presented to elaborate on the underlying assumptions and performance of the methodology in an ideal 
case. In Section 4, we provide evidence on the sparsity of some important geophysical signals. Exploiting 
the observed sparsity as an a priori knowledge, in Section 5, we cast the variational data assimilation 
in a Bayesian framework both in the spatial and wavelet domains. The promise of the methodology is 
demonstrated through an elementary constructed ID example in that section. Section 6, contains a brief 
discussion and points out to future research. 
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2 Notation 



We refer to a 2D signal X as a vector x, by stacking all the pixels in a fixed order. All vectors are column 
vectors and (•)^ indicates the transpose. For any vector x e K", Xi refers to its i*'* element, where i G 
{1, . . . ,n}. The same notation applies to a matrix operator H and its entries hi^i. The standard Zp-norm 
of X is denoted by ||x||p := Ixif)^^^, for p > 1, while the infinity norm is ||x||^ = max, \xi\. For 

p <1, ||x||p is no longer a norm and hence not convex; nevertheless, we will use the term norm in this case 
as well, keeping in mind this reservation. By weighted inner product, we denote (x, y)p := x^Py and hence 
the corresponding weighted Euclidean (quadratic) norm, is ||x||p = (x-^Px)^/^, where P >- is a symmetric 
positive definite matrix. In a linear transformation $ = [(j)i,(j>2, ■ ■ ■ , (f>m] € K"^™ with m > n (e.g., wavelet 
transform) , the columns (pi G M" denote the "atoms" whereby any certain class of signals x G M" can be well 
approximated by a linear combination of ^^'s, i.e., x = (pi^i = The vector x is said to exhibit a 
sparse representation in if the number of (significantly) non-zero elements of the representation coefficients 
c, is much smaller than the signal dimension. 

3 Variational Data Assimilation 

The theory of (recursive) least-squares estimation has been central to the development of the classical data 
assimilation methodologies. Let the true state of interest at time t be denoted by x{t) G M™, a noisy 
observation of the state by y{t) G ffi", and the background estimate of the state produced by a dynamical 
model of the underlying physics by Xb{t) G R™. We assume that the model can reproduce an unbiased but 
noisy estimate of the true state. In other words, it is assumed that the model can resolve the underlying 
physics in such a way that under a consistent perturbation of the input parameters and states, the ensemble 
average of the output tends to the tr^ie state as the number of ensemble members goes to the infinity, whilst 
the random deviation of each ensemble member can be well approximated by a Gaussian density. In a more 
formal setting we have two equations that relate the true state to the background state and observation as 
follows: 

Xb = X + w 

y=^(x) + v, (1) 

where w A/^(0, B), v ^ ^^(0, R) are uncorrelated Gaussian and the time index is dropped for brevity. 
Obviously the goal is now to obtain the so-called analysis state x^ e as the best estimate of the 
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true state, given the above pair of observation and the background state. From the 3D variational point of 
view, it amounts to obtaining the analysis state Xa which minimizes the sum of two quadratic cost functions, 
each of which quantifies the weighted Euclidean distance of the analysis to the background state Xf, and 
observation y; 

Xa = argmin|i||x6-x||B_i + ^\\y - n{x}\\l^-i^ , (2) 

where the weights are inverse of the error covariance matrices. For now, we assume that the measurement 
operator 7i{-) can be replaced with a linear time invariant H G jj^x™ operator. In the context of our study, 
the incremental formulation for nonlinear measurement operator, see (e.g.. Courtier et ai, 1994), which 
typically arises in direct assimilation of satellite radiance observations, will be briefly discussed in Section 5 

From a statistical estimation point of view, the variational form in equation (2) can be obtained through an 
ML estimator, xml — arg maxx p (y,Xb|x), where p (y,Xh|x) denotes the joint conditional density (Gaussian) 
of the observation and background state with respect to x. In this view, the cost function in equation (2) is 
equivalent to the negative of the log- likelihood function, assuming the background and observation vectors 
are independent, see (e.g., Bouttier and Courtier, 2002). Simple algebra and ignoring the constant terms 
in X yields a smooth quadratic cost function 

:7(x)=^x^ (B-i+H^R-iH) X- 

(B-ixb + H^R-V)^x, (3) 

where the analysis state is its potential unique minimizer, x^ = argmiux i7(x). Note that the cost function 
of equation (3) is strictly convex with unique global minimum provided that the Hessian V^i7(x) = B^^ + 
H"^R~^H is positive definite which requires that the measurement operator H be a full rank matrix, see 
Giver and Shakiban (2006, p. 160). This unique minimum can be obtained by setting the first order derivative 
to zero Vxi7 = 

X, = (B-i + H^R-iH)"' (B-ix6 + H^R-V) • (4) 

Through the Fisher information it can be shown that the obtained analysis in equation (4) can be an efficient 
(unbiased with minimum variance) estimator and its error covariance meets the Cramer-Rao lower bound, 
which is the inverse of the Hessian of i/(x), see (e.g., Bouttier and Courtier, 2002; Levy, 2008, p. 140). 

As is evident, the obtained closed form expressions are computationally prohibitive for large scale and 
ill-conditioned assimilation problems and typically first order iterative approaches (e.g., preconditioned con- 
jugate gradient) are required to efficiently compute the matrix inversions. One of the main advantages of 
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the variational formalism is its flexibility that for example the analysis state, constrained in a simple feasible 
and closed polyhedron (e.g.. 1 ^ x ^ u), can be obtained using the Fast Gradient Projection (FGP) methods 
developed for large scale quadratic programing problems (e.g., Nesterov, 1983; Serafini et at, 2005). 

Figure 1 shows the result of a 3D-VAR assimilation scheme applied to a stationary first order discrete 
Markovian process in M™, autoregressive (AR-1), where m — 64. A true signal (x) is generated and the 
background states and observations are obtained by adding Gaussian white noise with the signal-to-noise 
ratio (SNR) 8 dB and 10 dB respectively, where SNR — 20 log (o'x/o'noise)- Here, to simply resemble the 
uncaptured subgrid details, we have assimilated a coarse-scale observation signal with half of the size of the 
original signal. To this end, we first convolved the true signal with an average filter l/2[+l, +1], downsampled 
the smoothed observations by a factor of 2 and then added the white noise. Notice that in a matrix form, 
it suffices to define the observation operator H as a Toeplitz convolution matrix with hi i = hi^i i = 0.5, 
hi j — and then decimate the rows by a factor of 2. In other words, for each pair of two grid points in the 
model space there is only one observation node in the middle, which is assumed to be a noisy measurement 
of the mean of the true states on those grid points. 




Nodes 

Figure 1: Results of a 3D-Var assimilation in which the true signal (x) follows a stationary AR(1) discrete 
Gaussian process. Here we assumed Xi+i = yxi + ^1 — "f^Ui, where 7 — 0.8 and Ui ~ A/'(0, 1). As a result of 
the assimilation, the analysis exhibits an improved SNR = 11.20 dB. 

In the next section, we provide evidence on the non-Gaussian and sparse structure of the fluctuations of 
some important geophysical signals in terms of their wavelet coefficients. This property is completely ignored 
in the explained classical formulation of data assimilation and can serve as additional prior knowledge to 
constrain more and possibly enhance the accuracy of the assimilation results. 

4 Sparsity of Geophysical Signals 

Many natural signals exhibit a spatial organization of isolated high-intensity areas nested within less active 
larger-scale regions. This property often translates into a sparse representation, that is, a major portion of 
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Figure 2: Generalized Gaussian Density with unit standard deviation and various tail parameters. 

the signal can be projected onto (near) zero values under an appropriate transformation, while only a few 
significantly non-zero projection coefficients carry most of the signal energy. Motivated by Mallat (1989), 
Huang and Mumford (1999) and Wainwright et al. (2001) among many others, it has been recently shown by 
Ebtehaj and Foufoula-Georgiou (2011) and Ebtehaj et al. (2012) that precipitation reffectivity images exhibit 
a remarkably sparse representation in a redundant wavelet transform and the distribution of the wavelet 
coefficients can be well explained by the class of symmetric Generalized Gaussian Distributions (GGD). 
This family of densities p{x) cx exp (— |a;/s|") with a scale s and tail parameter a spans a wide range of 
exponentially bounded tail probabilities from a Dirac delta (a 0) to a uniform density (a oo) in limiting 
cases. The Gaussian {a = 2) and the Laplace {a — 1) densities are also two special cases of this family, see 
Figure 2. 

Figure 3 (upper panels) shows different geophysical signals ranging from very fast evolving dynamical 
processes such as precipitation and streamffow down to a landscape digital elevation map with a very slow 
evolving dynamics. Applying a Daubechies wavelet, histograms of the wavelet coefficients share relatively 
similar thick tail probability distribution, analogous to the GGD, whilst most of the values are (near) zero; see 
the lower panels of Figure 3. These observations imply that as an a priori knowledge, the wavelet coefficients 
(generalized ffuctuations) of these signals exhibit a sparse representation and can be well explained, at least 
in part, by the family of GGDs with the tail parameter commonly ranging in (0, 1]. Note that, this density 
is log-concave (i.e., the negative logarithm is a convex function) for a > 1, and hence the Laplace density 
(a = 1) is the best choice in this family that promotes sparsity while preserving a convex structure. Note 
that, the concept of sparsity is not restricted only to the wavelet coefficients of physical states with piece- wise 
smooth structure, such as the examples presented herein. Other (prominent) physical states with smooth 
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surfaces and trajectories may exhibit sparse representation in other transform domains such as the Fourier 
or Discrete Cosine Transform (DCT). 




Figure 3: Sparsity of some geophysical signals, top panel from left to right: (a) a level III NEXRAD rainfall 
reflectivity image in dBZ, over Texas on 1999/03/29 (20:13:00 UTC) at resolution ~ 1 x 1 km; (b) hillshade 
representation of high resolution lidar topographic data of a small watershed (2.8 km^ area) in the Oregon 
coast range near Coos Bay at resolution ~ 2 x 2 m; and (c) 40 years of daily streamflow signal (1948-1988) of 
Leaf river basin at Collins station (1944 km^ draining area), Mississippi. The bottom panels from left to right 
(d)-to-(f), show the corresponding probability histograms of the standardized wavelet coefficients c or say the 
generalized fluctuations of the above images in a probability scale. The fitted tail parameter (a) of the CCD 
is shown on the top right corner of the lower panel plots. 



5 Assimilation with Sparse Priors 

Having informative a priori knowledge about the distribution of the analysis state can serve to further 
constrain the assimilation problem and lead to an improved a posteriori estimate of the analysis from a 
Bayesian point of view. By definition, the Maximum a posteriori (MAP) estimator of the analysis state is 

x+ = argmax p(x|x6,y) . (5) 

X 

Applying the Bayes theorem and taking the logarithm, one can obtain 

x+ = arg min { J^(x) - log p(x)} , (6) 
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here i7(x) is the negative log-hkehhood function of the classical 3D-VAR, as previously explained. Note 
that, for a non-informative log-prior or say uniform density for p(x), this expression is exactly equivalent to 
the cost function of the ML estimator in equation (3). 

A general model for the prior distribution, often referred to as the Gibbs prior, is given by 

p(x) oc exp{-Ar(x)} , (7) 

where A > is a scaling parameter and T : M™ — > M is a functional mapping from the state space to a real 
number, see (e.g., Elad et ai, 2007). It follows from equations (6) and (7) that, the MAP estimator of the 
analysis state is 

x+ = arg min { J7(x) + AT (x)} , (8) 

X 

where the non-negative A acts as a trade-off parameter and plays an important role in the solution of the data 
assimilation problem. Naturally, small A weakens the effect of sparse prior and turns the problem into the 
classical least squares one, while larger A values, promote a more sparse solution. In this paper, to exploit the 
underlying sparsity, we suggest two particular choices of the transformation function T(x) which yields to a 
sparse-promoting reformulation of the variational data assimilation both in the wavelet and spatial domains. 

5.1 Linear Measurement Operator 
5.1.1 Wavelet Domain (W3D-VAR) 

Let us assume that the analysis state has a projection onto a redundant wavelet transform x = $c, where 
the columns of 4> G jjmxfe contain the wavelet "atoms", while c e M''" is the representation coefficients. As 
is evident, this matrix multiplication is equivalent to an inverse wavelet transform while another matrix 
Vj> ^ jj/cxm represents the forward wavelet transform for obtaining the wavelet coefficients, c = 'S'x. Given 
that, the wavelet coefficients of the analysis state exhibit a sparse structure and can be well explained by 
independent GGD distributions, a relevant choice for the functional mapping T(-) in the Gibbs prior can 
take the following form 

r(x) = Sr(^,x,f = ||*x||^ = ||c||^. (9) 

Choosing the closest convex representation of T(x) (i.e., p = 1), which is equivalent to assuming a Laplace 
prior for the wavelet coefficients, it follows that the 3D-VAR can be recast in the wavelet domain as 

c+ =argmin{^($c)+A||c||J, (10) 

C 
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where c+ denotes the analysis wavelet coefficients that can be used to reconstruct the analysis state in the 
physical state space via the inverse wavelet transform, x+ ~ ^c+. Note that both matrix multiplications, 
$c and "Jfx, can be performed very efficiently by the existing fast wavelet transforms, such as the orthogonal 
wavelet transform (i.e., = I ) (e.g., Mallat, 1989). It turns out that, due to its shift invariance property, 
the class of undecimated wavelet transforms is often preferred in this context, over the traditional discrete 
orthogonal wavelet transform (e.g., Coifman and Donoho, 1995). Notice that, assuming p — 2 ii\ equation 
(9) refers to a Gaussian prior for the wavelet coefficients and resembles the so called Tikhonov regularization 
in solving inverse problems. In this case, equation (10) has a closed form solution and this choice of prior is 
typically very suitable for smooth states. 

5.1.2 Spatial Domain (TV3D-VAR) 

Following the existence of a sparse structure in the wavelet coefficients or say generalized fluctuations of 
a geophysical signal x, another choice for the functional mapping T(x) in the Gibbs prior, is the Total 
Variation (TV) semi-norm of x, which leads to obtaining the analysis as 

x+ = argmm{J^(x) + A ||x||tv} . (11) 

Two popular choices for the discrete TV semi-norm (e.g., Rudin et al, 1992; Beck and Teboulle, 2009b) are: 
the isotropic one 

" / 

i=l 

and the Zi-based 

n 

1=1 

where, V hXi and V^a;^ are horizontal and vertical first order differences at pixel i, respectively. Note that 
obtaining the optimal solution of the TV3D-VAR cost function is more involved than the W3D-VAR as the 
TV semi-norm is not a separable functional. 

5.2 Nonlinear Measurement Operator 

By first order linearization of the measurement operator in equation (2) and a change of variable 5x = x — x;,, 
the classical 3D-Var in an incremental form is typically reformulated as (Courtier et al, 1994), 

J(5x) = ^ |l<5x|l^_, + ^ \\Sy - H<5x||^_, , (14) 



9 



where Sy = y — 'H(xf,), and here H is a suitable linear approximation (e.g., Jacobian) of 'H(x) in a small 
neighborhood around xt,. Finding Sxa as the minimizer of equation (14). the analysis can be obtained by, 
Xa = X5 + (5xa. Having the sparse prior assumption on the transformed increments of the analysis state, it 
naturally leads to a possible choice for an incremental MAP estimator as follows: 

6x+ = arg min {J{Sk) + A || *<5x|| , } , (15) 

X 

where, here 'S' is referred to an invertible transformation (e.g., wavelet) that sparsifies the increments. We 
again emphasize the fact that the above proposed formulation seems intuitively suitable for states with sparse 
increments under a proper transformation. Of course, complementary case studies and further empirical 
evidence are required for a thorough conclusion about the proper selection of the transformation. 

Notice that, although the proposed data assimilation formalism in equations (10), (11) and (15) is convex, 
the prior terms are not differentiable and hence the cost function is non-smooth. In this case classical (first 
order) gradient based methods are no longer applicable. Several optimization techniques have been recently 
proposed to deal with large-scale non-smooth convex cost functions similar to that in equation (8), where 
J' is smooth and T is a non-smooth convex function. A large effort has been devoted on using efficient 
interior point algorithms for this particular type of cost functions in large scale problems (e.g., Goldfarb and 
Yin, 2005; Kim et ai, 2007). However, very recently, accelerated proximal gradient methods have received 
significant attention due to their fast convergence rate and simplicity (e.g., Nesterov, 2007; Figueiredo et al, 
2007; Bioucas-Dias and Figueiredo, 2007; Beck and Teboulle, 2009a, b). 

5.3 Non-Gaussian Error 

All of the presented formulations so far have been focused on the fact that the model v e M'" and mea- 
surement w G M" error terms can be well explained by a multivariate Gaussian distribution as the most 
dominant error probability model. Sometimes the distribution of the error is symmetric with tails markedly 
thicker than the Gaussian case, analogous to the GGD family. In this case, it can be shown that naturally 
the ML estimator in equation (2) is. 



minimize 



B-V^ (xb - x) ''^ + R-V^ (y - H(x)) . (16) 



P2 



P2 



Notice that for pi — 2, i £ {1,2}, the problem in equation (16) is equivalent to the classical 3D-VAR cost 
function and is convex for all Pi > I. As explained before, it can be concluded that the Laplace model for the 
error is the thickest tail probability that can be considered while preserving convexity of the cost function. 
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Figure 4: Demonstration of the promise of the 3D-VAR with sparse prior for a ID example. From top-to- 
bottom, (a) the true signal x, (b) the noisy background xj, (c) the low-resolution and noisy observation y, 
(d) the obtained analysis with prior x^ , and (e) the analysis Xa that resulted from the classical 3D-VAR. 



Obviously, the above ML estimator in a Generalized Gaussian noise environment can be further extended to 
the MAP estimator by adding the prior term, as previously explained. 



6 A ID Synthetic Example 

In this section, by no means we intend to solve a real data assimilation problem but only to demonstrate the 
promise of the proposed formulations and specifically the role of the prior on the "analysis phase". To this 
end, we focused on a very simple piece-wise constant ID example with an extremely sparse structure in its 
first order differences, see Figure 4. As is evident, the first order differences of this signal exhibit a marked 
sparsity with only four non-zero elements at the jump discontinuities. Notice that, in this case the wavelet 
Zi-based and the explained TV-based priors become analogous provided that the wavelet dictionaries contain 
the Haar "atoms". To simplify and be more instructive, we have chosen a fist order differencing operator for 
obtaining a sparse representation. 

In this case, the 3D-VAR with sparse prior can be recast in the following simple form 

=argminU(x)-KA||Dx||J, (17) 

X 

where, D G M™^™ is the first order differencing operator with di^i — 1, di^i-i — —1 and dij — 0. To obtain 



11 



the analysis in equation (17) here we follow a quadratic reformulation of the problem, as studied by Chen 
et al. (1998) and Figueiredo et al. (2007) and references therein. To this end, by a change of variable z — Dx 
one can obtain 

z„ = argmm{:7(D-iz) + A||z||J, (18) 

where z G M™ can be split as z = u — v, with Ui ~ max (z^, 0) and Vi = max (— z,;, 0). Accordingly, the h- 
prior term can be written in a linear form as ||z|| ^ = l^^u+l^^v, where l.,„ = [1,1,... 1]"^ G M™. Augmenting 
u and V in w = [u v]^, equation (18) can be recast in the following constrained quadratic programming 
(QP) 

... It 

mmimize -w 
w 2 

subject to w :>= 0, (19) 

where, A > 0, C = D"^ (B"! + H'^R-iH) D"! and b = -D"'^ (B-^Xf, + H^R- V) • Through sub- 
differential analysis of equation (18), it can be shown that for A > ||b||^ the unique minimum of equation 
(19) is a zero vector with maximum possible sparsity. Here, we adopt A = 0.1||b||^ as suggested by Kim 
et al. (2007). 

The example signal x G M^^^ {a^ ~ 0.65) is a composition of two rectangular step functions. The 
background signal x^ G M.^'^^ in Figure (4b) is generated via adding a white Gaussian noise (cr^, ~ 0.15, 
SNR ~ 12.88 dB). Here, we assimilated a low-resolution and noisy version of the true signal as the observation 
y G M®^ into the background signal. To this end, the observation operator H G M^"'^^^^ is properly designed 
as explained in Section 2 and then a Gaussian white noise (cy = 0.10, SNR ~ 13.40 dB) is added to the 
downgraded version of the true signal, see Figure (4c). In this example, we used the Gradient Projection 
with backtracking line search (i.e., Armijo Rule ) for solving the constrained QP in equation (19), see 
(Bertsekas, 1999, p. 230). Furthermore, after obtaining z^, we optionally recalculated the magnitude of its 
nonzero elements by solely minimizing i7(D~^z), the least squares part of equation (18), constrained to the 
support set of z^, S ~ supp(za). In other words, we assumed that the zero elements of Zq are fixed and 
then calculated the magnitude of its non-zero elements, that is C^b, where C5 is a sub-matrix that contains 
those columns of C associated to the support set S and (•)^ is the Pseudo-inverse. 

The results in Figure 4 show the remarkable role of the sparse promoting prior on the quality of the 
estimated analysis state. It is clear that the estimation quality metrics have been slightly improved by 
solving a classical 3D-VAR assimilation problem; however, it led to over-fitted estimation. The result of the 
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new proposed formulation is close to an exact solution in this simple example and seems very promising 
by outperforming the classical 3D-VAR with more than two orders of magnitude in the SNR, which is a 
logarithmic metric. These results demonstrate that the error in the observation and background signal has 
been well suppressed while the discontinuities of the signal are also well recovered due to the incorporation 
of the prior. 

7 Discussion and Conclusion 

We introduced a new formalism for the variational data assimilation which takes into account a priori 
knowledge about the underlying statistical structure of the state in a transformed domain and showed the 
preliminary promise of the proposed methodology through a synthetic ID example. Although the formulation 
is presented for a 3D-VAR setting, it can be extended to a 4D-VAR context. In general, we can argue that 
the proper selection of the prior term typically yields a better error (noise) suppression, while the underlying 
structure of the state (e.g., discontinuities) can also be preserved. Although the focus of this study has been 
on sparsity of the state fluctuations and wavelet coeflicients, the role of the prior can be extended to other 
transformed domains such as the Fourier or DCT which are intuitively more suitable for smooth physical 
states. 

Study of the efficient proximal gradient methods for full scale and ill-conditioned data assimilation prob- 
lems with non-smooth prior can be of particular interest for future research in exploring the advantages of 
the proposed formulations in environmental predictability. The proposed W3D-VAR and TV3D-VAR are 
nonlinear estimators and hence, estimation of the analysis error covariance is a challenge and needs to be 
thoroughly investigated. As closed form expressions are not readily available for the covariance of these 
nonlinear estimators, randomization (e.g., bootstrapping) via ensemble techniques seems a viable approach 
for further study in this respect. 
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